Demystifying MLOps with Vetiver

Myles Mitchell @ Jumping Rivers

Before we start…

Who am I?

  • Principal Data Scientist @ Jumping Rivers:

    • Project management.

    • Python & machine learning support for clients.

    • Teach courses in programming, SQL, ML.

    • Organise North East & Leeds data science meetups.

Jumping Rivers

↗ jumpingrivers.com   𝕏 @jumping_uk

  • Machine learning
  • Dashboard development
  • R packages and APIs
  • Data pipelines
  • Code review
     

Talk plan

  • How I got into Data Science

  • First encounter with MLOps

  • Getting to grips with Vetiver (code examples)

  • Take home lessons

How I got into Data Science

My route into Data Science

  • PhD in Astrophysics (started 2017)

  • Extra training in “Data Intensive Science”

  • … academia is hard

  • Joined Jumping Rivers full time in 2022 (following an internship)

Life as a “Data Scientist”

My initial experience:

  • Software development (check out diffify.com)

  • Course writing and teaching

  • LOTS of merge requests

  • Conferences and meetups

  • Data science is not always about machine learning!

My first encounter with MLOps

Typical data science workflow

  • Data is imported and tidied.
  • Cycle of data transformation, visualisation and modelling.
  • Results are communicated to an external audience.

From Classical Stats to Machine Learning

  • Classical statistical modelling prioritises understanding the system behind the data.
  • By contrast, machine learning tends to prioritise prediction.
  • As data grows we retrain our ML models to optimise predictive power.
  • A goal of MLOps is to streamline this cycle.

MLOps: Machine Learning Operations

  • Framework to continuously build, deploy and maintain ML models.
  • Encapsulates the “full stack” from data acquisition to model deployment.
  • Includes versioning, deployment and monitoring.
  • Sounds simple enough …

Reality

The dreaded architecture diagram…

Reality for an MLOps beginner

  • Countless permutations

    • Modelling frameworks
    • Cloud platforms
    • Environment and container managers
  • Very multidisciplinary

  • Expensive

  • Where to even begin…?

Getting to grips with Vetiver

Vetiver

  • Integrates with popular ML libraries in R and Python.
  • Fluent tooling to version, deploy and monitor a trained model.
  • Deploy to a cloud service or to localhost.

Let’s build an MLOps stack!

Data

  • Palmer Penguins dataset:

    library("palmerpenguins")
    
    names(penguins)
    [1] "species"           "island"            "bill_length_mm"   
    [4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
    [7] "sex"               "year"             
  • Let’s predict species using flipper length, body mass and island!

Palmer Penguins dataset

Figure: scatter plot showing a positive relationship between penguin flipper length and body mass. Points are coloured by species and shaped by island; Gentoo penguins tend to have higher body mass and longer flippers than Adelie and Chinstrap.

Data tidying

  • Using {tidyr} and {rsample}:

    # Drop missing data
    penguins_data = tidyr::drop_na(penguins)
    
    # Split into train and test sets
    penguins_split = rsample::initial_split(
      penguins_data, prop = 0.8
    )
    train_data = rsample::training(penguins_split)
    test_data = rsample::testing(penguins_split)

Modelling

  • Let’s set up the model recipe in {tidymodels}:
    library("tidymodels")
    
    model = recipe(
      species ~ island + flipper_length_mm + body_mass_g,
      data = train_data
    ) |>
      workflow(nearest_neighbor(mode = "classification")) |>
      fit(train_data)

Model testing

  • Our model object can now be used to predict species:
    model_pred = predict(model, test_data)
    
    # Accuracy for unseen test data
    mean(
      model_pred$.pred_class == as.character(
        test_data$species
      )
    )
    [1] 0.8955224

Enter Vetiver!

  • Convert our {tidymodels} model to a {vetiver} model:

    v_model = vetiver::vetiver_model(
      model,
      model_name = "k-nn",
      description = "penguin-species"
    )
    v_model
    
    ── k-nn ─ <bundled_workflow> model for deployment 
    penguin-species using 3 features
  • Contains all the info needed to version, store and deploy our model!

Model versioning

  • Use {pins} to store R or Python objects for reuse later.

  • Store pins using “boards”, including Posit Connect, Amazon S3 or even Google Drive!

  • Storing in a temporary directory:

    model_board = pins::board_temp(
      versioned = TRUE
    )
    model_board |>
      vetiver::vetiver_pin_write(v_model)

Retrieving a model

  • Retrieve a model

    model_board |> vetiver::vetiver_pin_read("k-nn")
    
    ── k-nn ─ <bundled_workflow> model for deployment 
    penguin-species using 3 features
  • Inspect the stored versions

    model_board |> pins::pin_versions("k-nn")
    # A tibble: 1 × 3
      version                created             hash 
      <chr>                  <dttm>              <chr>
    1 20250930T160110Z-4130e 2025-09-30 17:01:10 4130e

Model deployment

  • We deploy models as APIs which take input data and send back model predictions.

  • APIs can be hosted at public endpoints on the web.

  • We can run them on localhost during testing and development.

  • {vetiver} uses {plumber} to create a model API.

Deploying locally

  • {vetiver} and {plumber} support local deployment:

    plumber::pr() |>
      vetiver::vetiver_api(v_model) |>
      plumber::pr_run()
  • Query the API via a simple dashboard or the command line.

  • Great for beginners to MLOps and APIs!
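To query the running API from another R session, {vetiver} provides an endpoint helper. A minimal sketch, assuming the API above is running locally on plumber's default port (8000); the example penguin values are made up:

```r
library("vetiver")

# Point at the /predict route of the locally running API
endpoint = vetiver::vetiver_endpoint(
  "http://127.0.0.1:8000/predict"
)

# Hypothetical new observation with the three model features
new_penguin = data.frame(
  island = "Biscoe",
  flipper_length_mm = 210,
  body_mass_g = 4500
)

# With the API running in a separate R session:
# predict(endpoint, new_penguin)
```

The same predict() call works unchanged against a remote endpoint URL once the model is deployed to the cloud.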

Deploying to Connect

  • Vetiver integrates nicely with Posit Connect:

    vetiver::vetiver_deploy_rsconnect(
      board = model_board, "k-nn"
    )
  • Easier and quicker if the pinned model is already on Connect.

  • We can also publish to Amazon SageMaker using vetiver_deploy_sagemaker().

Deploying to other cloud platforms

  • Prepare a Dockerfile with:

    vetiver::vetiver_prepare_docker(
      model_board,
      "k-nn"
    )
  • Use docker build to set up the environment:

    • Installs OS and R dependencies

    • Runs the API
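The build-and-run step itself uses standard Docker commands; the image name below is a placeholder, and port 8000 is the default exposed by the generated {plumber} API:

```shell
# Build the image from the generated Dockerfile
# (run in the directory containing it);
# "penguin-api" is a placeholder image name
docker build -t penguin-api .

# Run the API, mapping container port 8000 to the host
docker run --rm -p 8000:8000 penguin-api
```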

Model monitoring
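The monitoring idea can be sketched with {vetiver}'s metrics helpers. A minimal example (not from the talk) using simulated dated predictions; in practice you would log real predictions alongside their eventual ground truth:

```r
library("vetiver")
library("yardstick")

set.seed(123)

# Simulated log of dated predictions with known outcomes
preds = data.frame(
  date = as.Date("2025-01-01") + 0:99,
  species = factor(sample(c("Adelie", "Gentoo"), 100, replace = TRUE)),
  .pred_class = factor(sample(c("Adelie", "Gentoo"), 100, replace = TRUE))
)

# Accuracy aggregated over weekly windows
metrics = vetiver::vetiver_compute_metrics(
  preds,
  date_var = date,
  period = "week",
  truth = species,
  estimate = .pred_class,
  metric_set = yardstick::metric_set(accuracy)
)
metrics
```

Companion helpers vetiver_pin_metrics() and vetiver_plot_metrics() store these metrics on a {pins} board and plot them over time, so drift in model performance becomes visible.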

Wrapping up

MLOps tips

  • Move from large CSVs to more efficient formats like Parquet and Arrow.
  • Version your data or SQL query commands.
  • Use environment managers like {renv} to track dependencies.
  • Your preferred cloud platform may have built-in tools for model and feature selection.
  • Deploy locally before deploying to the cloud.
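The first tip above can be sketched with the {arrow} package; the file path here is a temporary one for illustration:

```r
library("arrow")
library("palmerpenguins")
library("tidyr")

penguins_data = tidyr::drop_na(penguins)

# Parquet is columnar and compressed, and preserves column
# types, so there is no re-parsing of factors on read
# (unlike CSV)
path = tempfile(fileext = ".parquet")
arrow::write_parquet(penguins_data, path)

penguins2 = arrow::read_parquet(path)
```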

Cost considerations

  • Some cloud platforms offer free trials (e.g., SageMaker 2-month trial).
  • May be cheaper if you’re already invested in a particular cloud platform:
    • Data services
    • App deployment
  • Costs can rise depending on computational resources consumed.
  • Model building and deployment use different environments!

Take home lessons

  • Life as a Data Scientist isn’t always about machine learning!

  • Architecture diagrams can be incredibly useful.

  • … but do consider your target audience!

  • You can get started on MLOps right now with free and open source tools.

  • Consider whether it is worth the cost/effort before investing in cloud infrastructure.

Thanks for listening!